
Adaptive multimodal fusion by uncertainty compensation with application to audiovisual speech recognition



Abstract

While the accuracy of feature measurements heavily depends on changing environmental conditions, studying the consequences of this fact in pattern recognition tasks has received relatively little attention to date. In this paper, we explicitly take feature measurement uncertainty into account and show how multimodal classification and learning rules should be adjusted to compensate for its effects. Our approach is particularly fruitful in multimodal fusion scenarios, such as audiovisual speech recognition, where multiple streams of complementary time-evolving features are integrated. For such applications, provided that the measurement noise uncertainty for each feature stream can be estimated, the proposed framework leads to highly adaptive multimodal fusion rules which are easy and efficient to implement. Our technique is widely applicable and can be transparently integrated with either synchronous or asynchronous multimodal sequence integration architectures. We further show that multimodal fusion methods relying on stream weights can naturally emerge from our scheme under certain assumptions; this connection provides valuable insights into the adaptivity properties of our multimodal uncertainty compensation approach. We show how these ideas can be practically applied for audiovisual speech recognition. In this context, we propose improved techniques for person-independent visual feature extraction and uncertainty estimation with active appearance models, and also discuss how enhanced audio features along with their uncertainty estimates can be effectively computed. We demonstrate the efficacy of our approach in audiovisual speech recognition experiments on the CUAVE database using either synchronous or asynchronous multimodal integration models. © 2009 IEEE.
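The abstract's core mechanism, compensating a stream's class-conditional likelihood by its estimated measurement-noise variance so that noisier streams automatically count for less in the fused score, can be sketched in a minimal univariate Gaussian form. This is a toy illustration under simplifying assumptions (single-frame, diagonal/scalar variances, simple log-likelihood summation), not the paper's full HMM-based formulation; all function and variable names are hypothetical:

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-density of a univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def compensated_loglik(x, mean, model_var, noise_var):
    """Uncertainty compensation: inflate the model variance by the
    estimated measurement-noise variance before scoring the feature."""
    return gaussian_loglik(x, mean, model_var + noise_var)

def fuse_streams(observations, models):
    """Sum compensated log-likelihoods over feature streams.

    observations: list of (x, noise_var) pairs, one per stream
    models:       list of (mean, model_var) pairs, one per stream

    As a stream's noise_var grows, its compensated density flattens,
    so it contributes less to the fused score: an implicit, adaptive
    stream weighting with no explicit weight parameters."""
    return sum(
        compensated_loglik(x, mean, model_var, noise_var)
        for (x, noise_var), (mean, model_var) in zip(observations, models)
    )
```

For example, if the visual stream is measured with very high noise variance while the audio stream is clean, the fused decision is driven almost entirely by the audio likelihoods, which mirrors the adaptivity the abstract attributes to the uncertainty-compensated fusion rules.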
